ABSTRACT



The design, implementation, and evaluation of an
adaptive scheduling algorithm for the MUNIX operating system
are reported here. MUNIX, a multiprocessing version of UNIX,
was designed to run on a dual PDP-11/45 multiprocessor
system. Topics covered include: a survey of adaptive
scheduling, laboratory equipment configuration, scheduling
with MUNIX, benchmark testing, and non-adaptive scheduling
changes. Conclusions and suggestions for possible
improvements are also included.
I. INTRODUCTION



In the fall of 197 H , the Computer Science Group at the 
Naval Postgraduate School acquired a fairly large amount of
computer hardware and a limited amount of software. The
intent of the acquisition was to integrate the hardware and
software to support signal processing research. The hardware
consisted of two PDP-11/50 computers made by Digital
Equipment Corporation, one CSP 30 processor made by Computer
Signal Processors Incorporated, and various associated
peripherals described in sections II.B.2 and II.B.3
(see Figure 3).

An agreement with Bell Laboratories provided the
software, consisting of an operating system called UNIX [15].
UNIX, as delivered, did not have the capability to fully
utilize all the system resources necessary to support signal
processing. As a result, several research projects were done
in this area. MUNIX [7], a multiprocessing version of UNIX,
was one of these projects. Note that where the word MUNIX
is used in this thesis, UNIX may be substituted. The
changes made to MUNIX may easily be incorporated into UNIX.

One of the goals of the computer system was to handle
real-time, time-shared, and batch processing [11]. It was
found that the scheduling algorithm used in UNIX could be
improved for the equipment configuration being used. An
excessive amount of time was being wasted swapping users in
and out of core. This wasted time was a function of both the
scheduling algorithm and the amount of available core.
Figure 1 shows the amount of real time, in minutes and
seconds, required to run the same benchmark program (see
Appendix B) with different amounts of core available.



Figure 1. Benchmark Real Time vs. Core Size
[plot: available core (K words) against benchmark real time (min:sec), time axis from 3:15 to 4:45]

A scheduling algorithm that gave the interactive user
faster response times and increased throughput was desired.
Since a member of the Computer Science Group was interested
in adaptive scheduling, thesis research was accomplished in
this area and is reported on here.
II. BACKGROUND INFORMATION



A. ADAPTIVE SCHEDULING - A SURVEY 
1. General

Scheduling algorithms can be placed into three
basic classifications: round robin, priority, and dynamic
or adaptive control. Algorithms based on round robin and
priority assignment techniques have a common characteristic:
the processor is switched from the process currently being
serviced to a new process at the end of a fixed time quantum
or when a new process is of higher priority. This switching
usually has a significant overhead and reduces system
utilization.
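The two switching triggers just described can be illustrated with a short C sketch. It is not taken from UNIX or MUNIX; the `struct task` fields, the `QUANTUM_TICKS` value, and the convention that a larger number means a higher priority are assumptions made only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-process state; the field names are illustrative. */
struct task {
    int priority;     /* larger value = more important (assumed convention) */
    int ticks_used;   /* clock ticks consumed in the current quantum */
};

#define QUANTUM_TICKS 6   /* illustrative quantum length, in clock ticks */

/*
 * Should the processor be switched away from `cur`?
 *  - round robin: yes, once the fixed quantum has expired;
 *  - priority:    yes, as soon as a ready task of higher priority exists.
 * `ready` is the best waiting task, or NULL if none is waiting.
 */
bool should_switch(const struct task *cur, const struct task *ready)
{
    assert(cur != NULL);
    if (cur->ticks_used >= QUANTUM_TICKS)
        return true;                                   /* quantum expired */
    return ready != NULL && ready->priority > cur->priority;
}
```

A pure round robin scheduler would use only the first test and a pure priority scheduler only the second. Every `true` result costs a process switch, which is the overhead referred to above.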

When an operating system is getting close to
saturation, the round robin scheduling algorithm often fails
to give an adequate response time to the time-sharing user
[16]. With a priority type algorithm, processes are assigned
priorities as they are entered into the system. The user
supplies the information necessary for the operating system
to assign a priority to the process. The information
supplied can consist of an estimate of CPU time, an estimate
of the amount of core required, and the number of
input/output devices required. This information usually is
an estimate of the maximum time or the maximum amount of
primary memory required.

Adaptive control addresses the problems of the round robin
and priority scheduling algorithms by giving adequate
turnaround times to all processes except those which are run in
the background, that is, processes not in contention for
immediate service. Several papers illustrating adaptive
control scheduling techniques are discussed below.

Northouse and Fu [13] develop a scheduling algorithm
based on adaptive control and clustering techniques.
Bernstein and Sharp [2] and Sharp and Roberts [16] develop
an algorithm based on the principle, "don't do anything
unless you have to." This avoids the system overhead of
process-switching and swapping as much as possible.

2. Adaptive-Control and Clustering Technique

An adaptive controller is a closed-loop, or feedback,
system that adjusts itself to the characteristics of the
process it controls. Northouse and Fu [13] proposed their
batch scheduler as an adaptive controller with three basic
units: a classifier, a performance evaluator, and a
distributor.

The classifier made an "a priori" classification of all
incoming jobs based on information supplied by the user on
his job card. A clustering algorithm was used to establish
clusters. The parameters used by the clustering algorithm
were:

1) CPU time used.

2) number of tape drives.

3) number of input cards.

4) programming language.

5) number of drum or disk files.

6) number of output pages.

Much effort went into the proper classification of
clusters with the following final result:

cluster I - medium CPU, large tape file jobs

cluster II - large jobs

cluster III - small jobs

cluster IV - medium CPU, small tape file jobs

The performance evaluator monitored the system 
performance in specific areas and compared these evaluations 
to desired responses. Specific areas monitored included 
central processor utilization, printer traffic, drum and/or 
disk traffic, and tape drive utilization. If efficiency in a 
monitored area dropped below a minimum acceptance level, a
change in the job stream was made. 

A performance index was calculated and attempts were
made to optimize this index for the next subinterval
(of variable length). The job stream was then determined
using this index and a linear programming technique. The
distributor implemented the job stream that was calculated
by the performance evaluator.

As jobs were executed, their statistics were used by
two more components called the data collector and the data
base updater. The updater made the system a closed loop
system by continually updating the data base from which the
linear programming function made its decisions.

Northouse and Fu, after running two simulations on
their scheduler, concluded:

1) The scheduler was able to adapt to changing
work loads.

2) The job stream had definite clusters.

3) The programming language was an important
parameter for classifying clusters.

4) The distributor required few calculations and
was easily updated.

5) Selected clusters could be forced through the
system, reducing their turnaround time.

6) Using a proper selection policy had a
significant impact on decreasing turnaround time.



3. Adaptive Policy Driven Scheduler

A "policy-driven scheduler" attempts to deliver
computational resources at a rate determined by some type of
criterion or policy function. Bernstein and Sharp [2] define
their policy functions in terms of "resource units" and "age
of interaction." An interaction consists of a request from
a user to the system, some system service, and a reply from
the system back to the user. An edit command on a
time-sharing system is an example of an interaction. The
execution of a batch job is another example.

The age of the interaction at some time "t" is the
elapsed time from when the user made the request to the
system until time "t". Resource units are a measure of
service received by a particular interaction. Let $c_i$ be an
arbitrary non-negative time cost for the i-th resource, and let
$R_{ij}(t)$ be the number of units of the i-th resource used by
the j-th interaction at age $t$. The total resource units
of the j-th interaction at age $t$, $R_j(t)$, is the sum over
all resources of $c_i$ times $R_{ij}(t)$:

$$R_j(t) = \sum_i c_i \, R_{ij}(t)$$
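The resource-unit sum can be computed directly. This is a minimal sketch that assumes the costs $c_i$ and usage counts $R_{ij}(t)$ are kept in parallel arrays. The function name `resource_units` is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Total resource units of one interaction:
 *     R_j(t) = sum over i of cost[i] * used[i]
 * cost[i] is the non-negative time cost of resource i and used[i] the
 * number of units of resource i the interaction has used up to age t.
 */
double resource_units(const double *cost, const double *used, size_t nres)
{
    double total = 0.0;
    for (size_t i = 0; i < nres; i++) {
        assert(cost[i] >= 0.0 && used[i] >= 0.0);
        total += cost[i] * used[i];
    }
    return total;
}
```

Because every term is non-negative and usage counts never decrease, $R_j(t)$ can only stay level or rise with age. That is the property of the resource count used below.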



Figure 2. Resource Count and Policy Function vs. Age
[plot: resource count R and policy function F(t) against age t]



The resource count is a nondecreasing function of time.
Figure 2 shows the resource count function for the j-th
interaction up to age $t_1$. A slope of zero indicates periods
of no service, while a positive slope indicates periods of
resource consumption. The policy function, F(t), is shown as
a curve.

The goal of the scheduling algorithm is to keep the
terminal point of each interaction above the policy
function. The total amount of resources required to complete
an interaction is not usually known in advance. Thus, the
algorithm tries to maintain $R_j(t) \ge F(t)$ at all times.

An interaction is critical at time "t" if $R_j(t)$ is less
than $F(t)$. Interactions are ordered according to a measure
called "critical time." Critical time is defined as:

$$t_0 + t_c - t_1,$$

where $t_0$ is the current time and $t_1$ is the current age of
the interaction; $t_c$ is the last age of the interaction at
which it went critical. Figure 2 shows an interaction
of age $t_1$ which became critical at age $t_c$ and which is still
critical.
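The sketch below restates the two definitions in C. The function names are hypothetical. Because $t_0 - t_1$ is the fixed clock time at which the interaction started, critical time is simply the absolute clock time at which the interaction went critical. It does not drift while the interaction waits, and that is why it needs updating only after service.

```c
#include <assert.h>
#include <stdbool.h>

/* An interaction is critical when its resource count lies below the
 * policy function at its current age. */
bool is_critical(double resource_count, double policy_value)
{
    return resource_count < policy_value;
}

/*
 * Critical time = t0 + tc - t1, where
 *   t0 = current clock time,
 *   t1 = current age of the interaction,
 *   tc = age at which the interaction last went critical.
 * (t0 - t1) is the interaction's start time, so the result is the clock
 * time at which it went critical.
 */
double critical_time(double t0, double t1, double tc)
{
    assert(tc <= t1);
    return t0 + tc - t1;
}
```

Advancing $t_0$ and $t_1$ together, as happens while an interaction waits without service, leaves the result unchanged.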

Critical time changes only when service is received and
thus only needs to be updated at that time. This property
ensures that the queues of processes ordered by this
quantity remain ordered as time progresses. After a process
receives service, it must be relocated in the queues.



The scheduler has two queues. The "core queue" is a
linked list of processes in main memory and the "drum queue"
is a linked list of processes not in main memory. Both
queues are ordered by critical time. Processes from the
head of the "drum queue" are transferred into main memory if
room is available. If room is not available for a process,
a swapping decision is made based on a comparison of the
critical times of the first process on the drum queue and
the last process on the core queue.
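Here is a sketch of the two ordered queues and the swap comparison. It assumes that a smaller critical time means more urgent, because the interaction went critical earlier. Under that assumption each queue is kept in ascending order, with its most urgent member at the head. The structure and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

struct qproc {
    int pid;
    double crit;             /* critical time; smaller = more urgent (assumed) */
    struct qproc *next;
};

/* Insert p into a singly linked list kept in ascending critical-time order. */
void queue_insert(struct qproc **head, struct qproc *p)
{
    while (*head != NULL && (*head)->crit <= p->crit)
        head = &(*head)->next;
    p->next = *head;
    *head = p;
}

/* Last (least urgent) element of a queue, or NULL if the queue is empty. */
struct qproc *queue_tail(struct qproc *head)
{
    if (head == NULL)
        return NULL;
    while (head->next != NULL)
        head = head->next;
    return head;
}

/* Core is full: swap only if the head of the drum queue is more urgent
 * than the tail of the core queue. */
int should_swap(const struct qproc *drum_head, const struct qproc *core_tail)
{
    assert(drum_head != NULL && core_tail != NULL);
    return drum_head->crit < core_tail->crit;
}
```

Since critical time changes only after service, a process has to be re-inserted with `queue_insert` only at that moment.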

This scheduler reduces overhead caused by unnecessary 
swapping by prohibiting the replacement of a process in core 
by a noncritical process which is not in core. The rules 
that make up the swapping decision are: 

1) A critical process in core is not eligible to be
swapped out (designed to prevent thrashing).

2) Processes which are inactive because they are
awaiting communication from a terminal are given a
critical time of t(e).

3) A noncritical process which is not in main memory
will be swapped in if room exists or if room can be
created by swapping out processes with a critical time
of t(e).

4) A critical process which is not in main memory will
displace a noncritical process which is in main memory.

Bernstein and Sharp were unable to detect thrashing using
these constraints.
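The four rules can be read as a single predicate that decides whether an in-core process may be displaced by one waiting on the drum. This sketch assumes that a process awaiting terminal input, with critical time t(e), is treated as noncritical. It omits the case where room already exists, because no displacement is needed there. The names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct sproc {
    bool critical;        /* resource count is below the policy function */
    bool tty_wait;        /* awaiting terminal input: critical time t(e) */
};

/* May `out` (in main memory) be swapped out to make room for `in`
 * (on the drum)? */
bool may_displace(const struct sproc *in, const struct sproc *out)
{
    assert(in != NULL && out != NULL);
    bool out_critical = out->critical && !out->tty_wait;   /* rule 2 */
    if (out_critical)
        return false;     /* rule 1: a critical process stays in core */
    if (in->critical)
        return true;      /* rule 4: critical displaces noncritical */
    return out->tty_wait; /* rule 3: noncritical may evict only t(e) processes */
}
```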



The most critical process which is in main memory and
ready to execute is given the processor. The period of use
is one time quantum or until the process voluntarily
relinquishes it, whichever occurs first. The processor is
again dispatched after a swapping decision is considered.

The policy function controls the service received by
each interaction. Static policy functions must be set
conservatively to avoid response problems during heavy
loads; however, it was found that during light load periods
these conservative settings resulted in a wide range of
service rates to similar jobs. Sharp and Roberts [16] found
that varying the policy functions as the job load changed
greatly reduced the service variance.

Bernstein and Sharp showed that their policy-driven
scheduler was far better than a round robin scheduler in
terms of "internal response time," measured in seconds:

SCHEDULER        MINIMUM   MEDIAN   MAXIMUM
Round Robin        ?.5      10.8     102.?
Policy-Driven      0.5       1.7       3.8

Table I. Round Robin vs. Policy-Driven Scheduler



Sharp and Roberts demonstrated that their adaptive policy-driven
scheduler was far better than the static policy-driven
scheduler, with the following results:

SCHEDULER      RESPONSE TIME   CPU UTILIZATION   DISC UTILIZATION
Non-Adaptive   5.1 sec.        61 %              (illegible)
Adaptive       1.7 sec.        66 %              (illegible)

Table II. Non-Adaptive vs. Adaptive Scheduler

Note the large differences in the magnitude of the response
times in both comparisons.

An adaptive policy-driven scheduler as it pertains to
the MUNIX operating system for the PDP-11/50 will be
discussed in detail in Chapter III.

B. LABORATORY EQUIPMENT CONFIGURATION 
1. General

Although MUNIX is a multiprocessor operating
system, all testing was done with only one processor active.
This was done to control testing. Documentation of the
laboratory equipment configuration is necessary because test
results depend on which of the two systems is used. Figure
3 shows the laboratory equipment and configuration. During
the design and implementation of the adaptive scheduler, the
operational equipment consisted of two PDP-11/50 CPUs
(labeled A and B) with the following equipment:



2. System A

32K MOS memory (450 nsec. access time)

16K core memory (750 nsec. access time)

16K CSPI memory (900 nsec. access time)

1 DEC LA30-C terminal

1 disk cartridge (RK05 equivalent)

1 Versatec printer/plotter
1 paper tape reader/punch

1 Vector General 3D3I vector display terminal
1 Ramtek raster scan color display unit
1 Tektronix 4014 display terminal
1 Hughes Conographic console
1 Data Tablet
1 EPC graphic recorder



3. System B

96K CMI core (850 nsec. access time)

16K CSPI core (900 nsec. access time)

1 DEC LA30-C terminal

2 DECpack disk cartridges (RK05)

1 DEC DH11-AC communications multiplexor connected
to (up to 16) remote terminals
1 card reader (600 cards/min)

1 impact printer (400 lines/min)

2 nine-track magnetic tape drives
1 seven-track magnetic tape drive



4. System A and B Differences 



The most important difference between the two
systems is the amount of memory available. System B has more
memory than system A and therefore can have more processes
in resident core. This has a significant effect on any
benchmark testing (see section II.D). In addition, the
speed of the memory must also be considered.

Figure 3. Laboratory Equipment Configuration
[diagram; * marks a component active during benchmark testing]



C. SCHEDULING WITH MUNIX ON THE PDP-11/50



MUNIX [7] is a multiprocessing version of UNIX [15], a
general purpose, multiuser, interactive operating system
usable on the Digital Equipment Corporation PDP-11/40,
PDP-11/45, and PDP-11/50 computers. UNIX was developed by
Bell Laboratories.

In order to understand the MUNIX scheduling algorithm
and implement a new one, it was necessary to learn C, a high-level
programming language. Several references were helpful
in this endeavor [1,10,18].

Although parts of the MUNIX operating system have been
documented [7,12], Appendix A attempts to document completely
the portions related to scheduling.

D. BENCHMARK TESTING 

Benchmark testing is often used to evaluate and compare
the performance of one computer system relative to another.
A benchmark program was written to test scheduling
algorithms. It was necessary to use processes that
accomplished input, output, computations, and compilations.
A discussion of the benchmark program used may be found in
Appendix B.



E. NON-ADAPTIVE SCHEDULING CHANGES

1. General

After studying the scheduling algorithm used by
MUNIX, it was decided to make some non-adaptive changes
before implementing an adaptive scheduler. Three areas were
modified to make the algorithm more efficient and to provide a
better basis for performance evaluations of the
adaptive scheduler. The changes are documented in detail in
Appendix C and summarized here.

2. Maximum Number of Processes (NPROC)
a. Change

NPROC was a static upper limit for the number
of processes in the process table. The static upper limit
was replaced by a dynamic one, thus saving process table
search time.
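The thesis does not show here how the dynamic limit is maintained; Appendix C has the details. One possible scheme keeps a high-water mark one past the highest slot in use, so that table searches stop there instead of scanning all NPROC entries. The sketch below uses that scheme only as an illustration. `proc_hiwater`, `proc_alloc`, `proc_free`, `proc_count_active`, and the one-field slot structure are hypothetical. NPROC is fifty, the table size given in Appendix A.

```c
#include <assert.h>

#define NPROC 50                /* process table size (Appendix A) */

struct pslot { int p_stat; };   /* 0 = slot free (simplified) */

static struct pslot proc[NPROC];
static int proc_hiwater;        /* one past the highest slot in use */

/* Allocate a slot and raise the dynamic search limit if needed. */
int proc_alloc(int stat)
{
    assert(stat != 0);
    for (int i = 0; i < NPROC; i++) {
        if (proc[i].p_stat == 0) {
            proc[i].p_stat = stat;
            if (i + 1 > proc_hiwater)
                proc_hiwater = i + 1;
            return i;
        }
    }
    return -1;                  /* table full */
}

/* Free a slot and lower the limit past any trailing free slots. */
void proc_free(int i)
{
    assert(i >= 0 && i < NPROC);
    proc[i].p_stat = 0;
    while (proc_hiwater > 0 && proc[proc_hiwater - 1].p_stat == 0)
        proc_hiwater--;
}

/* Searches scan only up to the dynamic limit, not all NPROC slots. */
int proc_count_active(void)
{
    int n = 0;
    for (int i = 0; i < proc_hiwater; i++)
        if (proc[i].p_stat != 0)
            n++;
    return n;
}
```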



b. Evaluation

The benchmark program (see Appendix B) was
run against the scheduling algorithm before and after this
change. Four runs were made, two with a drum being used for
/tmp files (temporary files), and two without. The results
are listed in Tables III and IV. Real, user, and system
times are shown in minutes and seconds. Appendix B explains
how the system calculates these times and estimates their
accuracy. This testing was accomplished on the "B" system
(see section II.B.3).



         BEFORE CHANGES   AFTER CHANGES
real     6:02             6:00
user     2:12             2:00
sys       :42              :41

Table III. NPROC Change Evaluation with No Drum

         BEFORE CHANGES   AFTER CHANGES
real     4:49             4:38
user     1:68             1:48
sys       :41              :42

Table IV. NPROC Change Evaluation with Drum



3. Looping in Function Sched

a. Change

As described in sections B.1.c.(1) and B.1.c.(2)
of Appendix A, two loops in sched were shortened so no
unnecessary code was executed.



b. Evaluation

The benchmark program (see Appendix B) was
run before and after the changes. Several runs were made
because the statistics showed no significant savings (see
Table V). Testing was accomplished on the "A" system.

         BEFORE CHANGES   AFTER CHANGES
real     7:08             7:08
user      :4?.6            :46.6
sys       :18.0            :17.8

Table V. Looping Change Evaluation



4. Size Check

a. Change

Function sched was changed to make an
additional check before swapping a process out of core: the
size of the incoming process had to be smaller than or equal
to the size of the outgoing process. If the incoming
process is larger than all processes eligible for swapping,
no size check is made. This task is accomplished using two
passes. See Appendix C for a detailed explanation.
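The two-pass search can be sketched as follows. This is a simplification: it takes the first eligible candidate in table order, whereas sched also weighs priorities. The `struct cproc` fields and the `pick_victim` name are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

struct cproc {
    int size;        /* core occupied, in arbitrary units */
    int eligible;    /* nonzero if it may be swapped out */
};

/*
 * Choose a process to swap out to make room for one of size `need`.
 * Pass 1 accepts only candidates at least as large as the incoming
 * process. Pass 2 runs only if pass 1 found none, i.e. the incoming
 * process is larger than every eligible one, and drops the size check.
 * Returns an index into p, or -1 if nothing is eligible.
 */
int pick_victim(const struct cproc *p, size_t n, int need)
{
    assert(p != NULL || n == 0);
    for (int pass = 1; pass <= 2; pass++) {
        for (size_t i = 0; i < n; i++) {
            if (!p[i].eligible)
                continue;
            if (pass == 1 && p[i].size < need)
                continue;                    /* size check, first pass only */
            return (int)i;
        }
    }
    return -1;
}
```

The size check avoids swapping out a small process that could never free enough core on its own, which would only lead to further swaps.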
b. Evaluation

Several runs were made with the benchmark
program on the "A" system to confirm the 26 percent savings
realized in real time (see Table VI). This change was also
tested on the "B" system, but a savings of only 6 percent
was found there. The difference in savings is explained by
the significant difference in available memory between the
two systems (the "B" system has three times as much user
space).



         BEFORE CHANGES   AFTER CHANGES
real     7:0?             5:18
user      :06.6            :45.3
sys       :15.0            :17.9

Table VI. Size Check Change Evaluation
III. ADAPTIVE SCHEDULER



A. DESIGN 



The adaptive scheduler described in section II.A.3
was implemented with a few minor changes.

1. Goals

There were two goals that the adaptive
scheduler attempted to meet.

a. Improve system throughput by reducing the amount of
process swapping (in and out of core).

b. Give the interactive user better response time. The
MUNIX scheduler basically gave users a round robin type of
service.



2. Design Changes

Sharp and Roberts [16] measured the criticality of a
process as the time from when the process last went critical
until the current time (see Figure 2). The implemented
scheduler measures the critical time as the vertical distance
from the policy function to the resource count. This design
change was made to facilitate a more efficient calculation of
priorities. Sharp and Roberts calculated priorities on a
fixed period basis. In the current scheme, priorities are
calculated whenever they are needed. There are two reasons
for this change:



1) The policy function is changed whenever the job
(process) load reaches predetermined amounts. When the
function is changed, all jobs, both in and out of core, have
to have their priorities recalculated with the new policy
function.

2) Depending on the fixed time period, jobs may receive
an excessive or insufficient amount of resources.

By recalculating the priorities on a continuing basis,
no special software is needed to recalculate the priorities
after policy functions have been changed. Also, every time a
scheduling decision is made, it is made with the latest
priorities of all the jobs concerned.
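Here is a sketch of computing priorities on demand. It assumes the "vertical distance" is taken as $R(t) - F(t)$, so that negative values mark critical processes and the most negative value is the most critical. The policy functions are straight lines whose slope is chosen from the current process count. The thresholds, slopes, and function names are invented for illustration, and the sketch does not reproduce the actual schpri.c (Appendix D).

```c
#include <assert.h>

#define NLEVELS 3

/* Hypothetical load thresholds and policy slopes: a heavier load selects
 * a lower slope, i.e. a more conservative policy function F(t). */
static const int    load_limit[NLEVELS]  = {5, 10, 20};
static const double slope[NLEVELS + 1]   = {0.50, 0.25, 0.10, 0.05};

/* Policy function F(t) for the current number of processes. */
double policy(double age, int nproc)
{
    int k = 0;
    while (k < NLEVELS && nproc > load_limit[k])
        k++;
    return slope[k] * age;
}

/* Priority = vertical distance from F(t) to the resource count R(t).
 * Negative means critical; larger means lower scheduling priority. */
double sched_priority(double resources, double age, int nproc)
{
    assert(age >= 0.0 && resources >= 0.0);
    return resources - policy(age, nproc);
}
```

Because `policy` reads the load each time it is called, a change of policy function takes effect at the next scheduling decision without a separate pass over all processes. This corresponds to the first reason given above.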



B. IMPLEMENTATION 



1. Process Table Control Variables 

a. p_time - was changed from a character
variable (maximum value of 127) to an integer variable
(maximum value of 32767). The use of the variable was also
changed (see Appendix A section A.1.f). Currently it is used
as a counter for the total number of seconds since a process
(job) has last had any teletype input. It is also used to
calculate the priority of the process.

b. p_flag - was changed from a character
variable to an integer variable because two additional bits
were needed as special indicators.

1) PSTM - (value of 400 octal) when set
means that the process has received a minimum amount of
resources in a minimum amount of time and from this point on
will be run strictly as a background process.

2) TPWAIT - (value of 1000 octal) when
set means that this process is waiting for terminal input
and will not be scheduled to run again until terminal input
has been made.
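The two new bits can be sketched as ordinary flag operations on the widened `p_flag`. The octal values are those given above. The helper functions are hypothetical. Resetting `p_time` when terminal input arrives is an assumption, based on its description as a count of seconds since the last teletype input.

```c
#include <assert.h>
#include <stddef.h>

#define PSTM    0400    /* minimum service received: run as background */
#define TPWAIT  01000   /* waiting for terminal input */

/* Only the two fields discussed here. p_flag must be an int: 0400 and
 * 01000 do not fit in the original 8-bit character field. */
struct pflags {
    int p_flag;
    int p_time;         /* seconds since the last teletype input */
};

void tty_block(struct pflags *p)
{
    assert(p != NULL);
    p->p_flag |= TPWAIT;            /* not scheduled until input arrives */
}

void tty_input(struct pflags *p)
{
    assert(p != NULL);
    p->p_flag &= ~TPWAIT;           /* eligible to be scheduled again */
    p->p_time = 0;                  /* restart the "seconds since input" count */
}

int sched_candidate(const struct pflags *p)
{
    return (p->p_flag & TPWAIT) == 0;
}

int is_background(const struct pflags *p)
{
    return (p->p_flag & PSTM) != 0;
}
```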



c. p_resr - was added as an integer variable.
It is used to keep track of the amount of resources received
by the individual process. The U-vector (see Appendix A
section A.2) of each process has two variables, u_utime and
u_stime, that contain the same information. These variables
could not be used for two reasons:

1) The U-vector is in core only when the
process is core resident. Priorities must be calculated
both when the process is in and out of core.

2) Because of the inter-dependency of
processes [15] (page 370), the parent (or grandparent)
process accumulates the resource units (u_utime and u_stime)
of its children (or grandchildren).

This new variable only accumulates resource units for the
process it is related to. p_resr is incremented in program
clock.c in the same places as u_utime and u_stime.
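The sketch below shows how the U-vector times and `p_resr` differ. The structures are cut down to the fields discussed. `clock_tick` and `reap_child` are hypothetical helpers, and `reap_child` follows the description above rather than any particular UNIX source.

```c
#include <assert.h>
#include <stddef.h>

struct proc { int p_resr; };             /* process table: always in core */
struct user { long u_utime, u_stime; };  /* U-vector: swapped with the process */

/* One clock tick charged to the running process. */
void clock_tick(struct proc *p, struct user *u, int user_mode)
{
    assert(p != NULL && u != NULL);
    if (user_mode)
        u->u_utime++;
    else
        u->u_stime++;
    p->p_resr++;                          /* charged in the same places */
}

/* As described above, a parent's U-vector times also absorb its
 * children's; p_resr is deliberately left alone. */
void reap_child(struct user *parent, const struct user *child)
{
    assert(parent != NULL && child != NULL);
    parent->u_utime += child->u_utime;
    parent->u_stime += child->u_stime;
}
```

After a child is reaped, the parent's U-vector reflects work the parent never did, while its `p_resr` still reflects only its own service. This is why `p_resr` is the better input for the priority calculation.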

2. Priority Calculations

Subroutine "schpri.c", schedule priority, was
written to calculate process priorities (see Appendix D).
There were two possible areas of the operating system in
which the priorities could be calculated:

a. Subroutine swtch, in program slp.c, is
entered each time the operating system changes (switches)
from one process to another.

b. Subroutine sched, also in program slp.c,
is entered each time a process is being considered for
swapping.

A test was run to examine the number of times the two
subroutines are executed relative to each other. It was
found that swtch is executed approximately thirteen times as
often as sched. Sched also decides which process should be
core resident, while swtch decides which process should be
executed next. As a result of these two observations, sched
was chosen as the place to calculate priorities.

3. Scheduling Algorithm

Subroutine sched (see Appendix A section B.1)
is described here with the adaptive changes. This subroutine is a
process with an infinite loop initiated from function main.
Sched is executed, put to sleep, awakened, and executed
again. Sched's main job is to swap processes in and out of
core. It accomplishes this task using the following
algorithm:

a. Set the first/second pass indicator
(fspass) to a value of first pass.

b. If there are any processes in the swap
file, find the one that is the most critical and try to
transfer it into core. If the swap file is empty, then go
to sleep on the RUNOUT flag, which indicates there are no
processes in the swap file. When awakened, go to "b." and
continue. Sched may try up to three different
ways to bring this process into core.

c. If room exists in core, then transfer the
process into core.

d. If room does not exist in core, then sched
looks for what it calls "easy core". Easy core is core that
belongs to a process that is not a system process, not
locked, and waiting for some type of input or output. An
additional constraint that applies only on the first pass is
that the size of core needed for the incoming process is
less than or equal to the size of core of the outgoing
process. If easy core is found, then the outgoing process is
swapped out and sched goes to "c." to continue. Sched
repeats this until either the process can be transferred in
or no easy core is available.

e. If no easy core is available, then sched
searches all the processes that are in core for the one that
has the lowest priority and meets the following constraints:
the process must not be a system process, not locked,
sleeping, and not currently being run on the other
processor. Two additional constraints, made on the
first pass only, are the size check mentioned in "d." above
and a check that the process can be ready to run instead of
sleeping. At this point there are two possible states to
consider.

1) The lowest priority process eligible
to be swapped out is critical. If this is the first pass,
change the first/second pass indicator to second pass and go
to "d."; or, if this is the second pass and no processes are
eligible to be transferred out, go to sleep on the RUNIN
flag (processes are in the swap file). When awakened, go to
"a." and continue.

2) If the lowest priority process
eligible to be swapped out is not critical, or this is the
second pass and a process meets the requirements in "e."
above, then swap the indicated process out of core, go to
"c.", and continue.
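Steps a through e can be sketched as a single C function. This is a simulation, not the MUNIX source. Core is modelled as a single free-space count, and the process fields and names are hypothetical. Priority is assumed to be R(t) - F(t), where a negative value means critical and a larger value means lower priority. The ambiguous first-pass constraint in step e is read as "the process must be ready to run". Instead of sleeping, the function returns the flag sched would sleep on.

```c
#include <assert.h>
#include <stddef.h>

enum pstate { P_READY, P_SLEEP, P_IOWAIT };
enum sched_result { SWAPPED_IN, SLEEP_RUNOUT, SLEEP_RUNIN };

struct sp {
    int size;            /* core needed, in arbitrary units */
    int in_core;         /* nonzero if resident */
    int priority;        /* R(t) - F(t): negative = critical, larger = lower */
    int system, locked;  /* such processes are never swapped */
    int other_cpu;       /* currently running on the other processor */
    enum pstate state;
};

static int swappable(const struct sp *p)
{
    return p->in_core && !p->system && !p->locked && !p->other_cpu;
}

/* Step d: "easy core" belongs to a swappable process waiting for I/O.
 * On the first pass the victim must also be at least `need` in size. */
static int find_easy(const struct sp *p, int n, int pass, int need)
{
    for (int i = 0; i < n; i++)
        if (swappable(&p[i]) && p[i].state == P_IOWAIT &&
            (pass == 2 || p[i].size >= need))
            return i;
    return -1;
}

/* Step e: the lowest-priority (largest R - F) swappable process.
 * First pass only: the size check, and the process must be ready to run. */
static int find_lowest(const struct sp *p, int n, int pass, int need)
{
    int v = -1;
    for (int i = 0; i < n; i++) {
        if (!swappable(&p[i]))
            continue;
        if (pass == 1 && (p[i].size < need || p[i].state != P_READY))
            continue;
        if (v < 0 || p[i].priority > p[v].priority)
            v = i;
    }
    return v;
}

/* One run of sched; on SWAPPED_IN, *who is the process brought in. */
enum sched_result sched_once(struct sp *p, int n, int *free_core, int *who)
{
    assert(n >= 0 && free_core != NULL && who != NULL);
    int in = -1, pass = 1;                          /* a. first pass */
    for (int i = 0; i < n; i++)                     /* b. most critical swapped out */
        if (!p[i].in_core && (in < 0 || p[i].priority < p[in].priority))
            in = i;
    if (in < 0)
        return SLEEP_RUNOUT;                        /* swap file is empty */
    for (;;) {
        if (p[in].size <= *free_core) {             /* c. room exists */
            p[in].in_core = 1;
            *free_core -= p[in].size;
            *who = in;
            return SWAPPED_IN;
        }
        int v = find_easy(p, n, pass, p[in].size);  /* d. easy core */
        if (v < 0) {
            v = find_lowest(p, n, pass, p[in].size);    /* e. */
            if (v < 0 || (pass == 1 && p[v].priority < 0)) {
                if (pass == 2)
                    return SLEEP_RUNIN;             /* e.1: nothing eligible */
                pass = 2;                           /* e.1: second pass */
                continue;
            }
        }
        p[v].in_core = 0;                           /* swap the victim out */
        *free_core += p[v].size;
    }
}
```

Each iteration either swaps a process out, switches to the second pass, or returns, so the loop always terminates.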

Two additional versions of the above algorithm were
tested and are discussed in section IV.A.
IV. CONCLUSIONS AND RECOMMENDATIONS



A. CONCLUSIONS 

1. Critical Processes 
a. Change 

Sharp and Roberts' algorithm [16] did
not allow any process which was critical to be swapped out
of core. That same constraint was implemented by changing
B.3.e.1 and B.3.e.2 in section III as follows:

B.3.e.1 The lowest priority process eligible to be
swapped out is critical. If this is the first pass, change
the first/second pass indicator to second pass and go to
"d.", or if this is the second pass, go to sleep on the
RUNIN flag and, when awakened, go to "a." to continue.

B.3.e.2 If the lowest priority process eligible to be
swapped out is not critical, then swap the process out and
continue with "c." above.

This change was implemented by only searching for processes
with a priority greater than zero.
b. Evaluation

Because of the inter-relationship
between processes, this change was able to lock all of core
and put the operating system into a deadlock condition. For
example, consider the case where both the parent and the
child processes are critical and the parent is waiting for
the child's termination. The child cannot terminate because
it is not in core, and it cannot get into core because the
parent is critical (and cannot be swapped out).

2. Non-Critical Processes
a. Change

It was thought that, by not trying to
transfer a non-critical process into core (by swapping
another out), swapping time could be saved. This change was
implemented by adding the following after III.B.3.e. above:

If the incoming process is not critical, then go to
sleep on the RUNIN flag. When awakened, go to "a." above to
continue.

b. Evaluation

Although this change did not cause a
deadlock, it did create an unsatisfactory result. Example:
the parent is out of core and has two children in core. The
first child is a compute bound job and has received a lot of
resource units. The parent becomes critical and forces the
non-critical (compute bound) child out. The second child
dies; the parent is now waiting on the first child, but that
child cannot get back into core until it becomes critical. So
the computer has nothing to do until the child is able to get
back into core by forcing the parent out and back into a new
location.



3. Implemented Algorithm

a. Change

All the changes are explained in section
III.B.3.

b. Evaluation

The benchmark program was run several
times on the "A" processor, with the results listed in
Table VII.

         BEFORE CHANGES   AFTER CHANGES
real     ?:18             8:33
user     (illegible)      (illegible)
sys       :17.9           (illegible)

Table VII. Implemented Algorithm Evaluation
The real time is clearly longer. This can be
explained in part by the amount of time spent swapping
processes. The MUNIX scheduler avoids this problem by
requiring that a process stay in core at least three seconds
and, once out, stay out at least two seconds. A similar
effect could be accomplished with the adaptive scheduler by
changing the process' p_time. This was decided against
because the adaptive scheduler would then be a modified (and
probably less efficient) MUNIX scheduler.
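The MUNIX residency rule can be expressed as two predicates that damp repeated swapping. The 3-second and 2-second thresholds come from the text. The function names and the single "seconds since the last move" counter are assumptions.

```c
#include <assert.h>

#define MIN_IN_CORE   3   /* seconds a newly loaded process must stay in core */
#define MIN_OUT_CORE  2   /* seconds a swapped-out process must stay out */

/* `secs` is the time since the process last moved into or out of core. */
int may_swap_out(int secs)
{
    assert(secs >= 0);
    return secs >= MIN_IN_CORE;
}

int may_swap_in(int secs)
{
    assert(secs >= 0);
    return secs >= MIN_OUT_CORE;
}
```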

4. Goals

The goals of this thesis were not met by the
adaptive scheduler. However, they were met in part by the
research accomplished while implementing the scheduler (the
non-adaptive scheduling changes, section II.E).

a. System throughput was improved by reducing
the amount of process swapping (see section II.E.4).

b. The interactive user is not given a better
response time, but all users are. This was accomplished by
improving the efficiency of the current scheduling
algorithm.
B. RECOMMENDATIONS



1. Adaptive Control 



a. MUNIX 



The adaptive control scheduling
algorithm concept, when applied to MUNIX or UNIX, will need
more careful consideration. UNIX uses a hierarchical process
structure which creates process interdependency problems.
An example of the problem can be found in the "time sh
benchmark" command sequence (Appendix B). The "time" command
is the parent of the "sh" command, which is the parent of
the "benchmark" command file. The "benchmark" command file
will go two additional generations lower in all "C" compile
commands. It is entirely possible to be eight or nine
generations deep without executing any involved command
sequences. Thus, it is a frequent occurrence that the
currently active child will have numerous (intentionally)
waiting ancestors which have no computational requirement
until the child terminates. The failure of the present
adaptive control effort seems to stem largely from the fact
that each process was not considered on its own merit.

When a parent is waiting for a child to terminate, the
parent should not be in competition for resource units with
the child. There are two possible solutions to this problem:
1) Any process that is in the wait state
should not have its "p_time" increased. This would not allow
a process to change priority while it is waiting on another
process. This change would require a "status" check where
"p_time" is incremented.

2) If the parent process is waiting on a
child, set a status flag so that the parent will not be put
in contention with its non-terminated child. This change
would be considered the general case, but it would require
more software changes.
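Both solutions amount to a check at the point where p_time is advanced. The sketch below is hypothetical, not MUNIX code: the numeric state codes, the flag name SCHWAIT, and the once-per-second routine age_processes are assumptions made for illustration. A process whose p_time is frozen cannot drift to a different priority while it waits.

```c
/* Hypothetical state codes and flag; the real MUNIX values may differ. */
#define SSLEEP  1       /* sleeping, signals cannot disturb */
#define SWAIT   2       /* waiting, signals may disturb */
#define SRUN    3       /* ready to run */
#define SCHWAIT 0100    /* assumed flag: parent is waiting on a child */

struct proc {
    int p_stat;         /* scheduling status (0 = unused block) */
    int p_flag;         /* status flags */
    int p_time;         /* seconds in or out of core */
};

/* Assumed to run once per second over the process table. */
void age_processes(struct proc *tab, int n)
{
    int i;

    for (i = 0; i < n; i++) {
        struct proc *p = &tab[i];

        if (p->p_stat == 0)
            continue;                       /* empty process block */
        if (p->p_stat == SSLEEP || p->p_stat == SWAIT)
            continue;                       /* solution 1: in a wait state */
        if (p->p_flag & SCHWAIT)
            continue;                       /* solution 2: parent flagged */
        p->p_time++;
    }
}
```

Solution 2 is shown as a separate test because, as the text notes, it is the general case: the flag excludes the parent regardless of how its p_stat happens to be set.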

b. Other Operating Systems 

Implementing an adaptive control
scheduling algorithm in an operating system with a minimal
hierarchical process structure seems to be straightforward.
Sharp and Roberts [16] reported no serious problems with
their implementation.

2. Additional Research

Additional research should be undertaken to
analyze process interactions in UNIX. To do this, a
comprehensive systems instrumentation package must be
developed. In particular, a better timing mechanism, a
"complete" resource utilization accounting system, and a
selective event tracing capability are needed. In light of
the level of improvement Sharp and Roberts [16] reported,
additional research with adaptive control and MUNIX is
warranted.
APPENDIX A: PROCESSES AND SCHEDULING

A. PROCESS INFORMATION

Any time the word "process" is used in this
appendix, the word "interaction" from section II may be
substituted.

1. Process Table (proc.h)

This table contains the control information 
for process scheduling. Currently, the table may contain up 
to fifty active processes, each occupying a process block 
with thirteen data elements describing it. A process is 
assigned a process block when it is created and relinquishes 
it when it is deactivated. The elements and their meanings 
are: 



a. p_stat - a process scheduling status with the
following possible states:

(1) SSLEEP - this process has been put to sleep
(not available to run).

(2) SWAIT - this process is waiting for some type
of input/output completion.

(3) SRUN - this process is ready to run.

(4) SIDL - this process is active but not in any
other status.

(5) SZOMB - this process has terminated but
information in the process control block is
required for other uses.



b. p_flag - process status of the memory manager with
the following possible states:

(1) SLOAD - this process is loaded in main memory.

(2) SSYS - this process is a system process.

(3) SLOCK - this process should not be swapped out
of main memory and is therefore locked in.

(4) SSWAP - this process is being swapped out of
core.

(5) SMDF - this process must run on the first (B)
processor.

(6) SMDS - this process must run on the second (A)
processor.

(7) SANYP - flag used for processor masking.

(8) SBRKPT - system breakpoint, used as a debug
tool.

(9) SGOING - this process is currently being run
by some processor.



c. p_pri - process priority. The priorities are whole
numbers and range from -128 (highest) to 127 (lowest).

d. p_sig - process signal indicator.

e. p_uid - the unique id assigned to the user that
created this process.

f. p_time - the total time in seconds that this process
has been in or out of core.

g. p_ttyp - the id of the terminal associated with this
process.

h. p_pid - the unique id assigned to this process.

i. p_ppid - the unique id assigned to the parent of
this process.

j. p_addr - the address (memory or disk) of the first
word of the process' "u vector" (described below).

k. p_size - the amount of non-shareable core this
process needs.

l. p_wchan - holds a number that can be a channel
address or a special indicator. The process is put to
sleep or suspended with this number and it can only be
awakened or restarted using the same number.

m. *p_text - pointer to the shareable portion of a
process, if it exists.
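For orientation, the thirteen elements can be pictured as a C structure. This is a sketch only: the field types, the numeric p_stat codes, and the p_flag bit values are assumptions, not the definitions in the actual proc.h, and the helper ready_in_core is hypothetical. It combines p_stat, SLOAD, and SGOING the way the descriptions of sched and swtch below require of a process that a CPU may pick up.

```c
#define NPROC  50           /* table size stated in the text */

/* p_stat values (numeric codes assumed) */
#define SSLEEP 1
#define SWAIT  2
#define SRUN   3
#define SIDL   4
#define SZOMB  5

/* p_flag bits (bit assignments assumed) */
#define SLOAD  0001
#define SSYS   0002
#define SLOCK  0004
#define SSWAP  0010
#define SMDF   0020
#define SMDS   0040
#define SANYP  0100
#define SBRKPT 0200
#define SGOING 0400

struct proc {
    int   p_stat;           /* a. scheduling status */
    int   p_flag;           /* b. memory-manager status bits */
    int   p_pri;            /* c. -128 (highest) .. 127 (lowest) */
    int   p_sig;            /* d. signal indicator */
    int   p_uid;            /* e. owning user */
    int   p_time;           /* f. seconds in or out of core */
    int   p_ttyp;           /* g. associated terminal */
    int   p_pid;            /* h. process id */
    int   p_ppid;           /* i. parent's process id */
    int   p_addr;           /* j. address of the u vector */
    int   p_size;           /* k. non-shareable core needed */
    int   p_wchan;          /* l. sleep/wakeup channel */
    void *p_text;           /* m. shareable portion, or NULL */
};

struct proc proc[NPROC];    /* a block with p_stat == 0 is free */

/* In core, ready to run, and not already running on a processor. */
int ready_in_core(const struct proc *p)
{
    return p->p_stat == SRUN
        && (p->p_flag & SLOAD)
        && !(p->p_flag & SGOING);
}
```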



2. U Vector (user.h)

The system associates 1024 bytes of storage
with each user process, called the "u vector". This storage
contains system per-process data and the system stack for
this process. An important difference between user.h and
proc.h is that proc.h is always core resident while user.h is
core resident only when the process it is associated with is
core resident. The MUNIX scheduling algorithm does not use
any elements from the u vector to make decisions.

B. SYSTEM FUNCTIONS PERTINENT TO SCHEDULING 

The scheduling flow and basic system flow are
shown in Figure 4 [7].

1. sched

This function is a process with an infinite
loop initiated from function main. Sched is executed, put to
sleep (see function sleep below), awakened (see function
wakeup below), and executed again. Sched's main job is to
swap processes in and out of core. It accomplishes this
task using the following algorithm: If there are any
processes in the swap file (out of core), find the one that
has been there the longest and try to transfer it into core.
If the swap file is empty, then go to sleep on the RUNOUT
flag which indicates there are no processes on the swap
file. When the longest waiting process has been found, sched
tries up to three different ways to transfer it into core.

Figure 4. Scheduling Flow

a. If room exists in core then transfer the
process in.

b. If room does not exist in core then sched
looks for what it calls "easy core". Easy core is core that
belongs to a process that is not a system process, not
locked (eligible for transfer out), and waiting for some
type of input or output. If easy core is found, then that
process is transferred out and sched starts over by looking
for the process that has been in the swap file the longest
(this will be the same process that it found before), and
continues with "a" above. Sched repeats this until either
the process can be transferred in or no easy core is
available.



c. If no easy core is available then sched
makes two more checks to ensure that the process is
deserving enough to require another process to be
transferred out.

(1) If the process has not been out of
core for more than two seconds, sched goes to sleep on the
RUNIN flag which indicates there is at least one process on
the swap file.

(2) Find the process (it must be
sleeping or ready to run, but not running) that has been in
core the longest. If that process has not been in core at
least two seconds then sched goes to sleep on the RUNIN
flag. If it has been in more than two seconds, transfer it
out and start over by looking for the process that has been
out of core the longest and continuing with "a" above.
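The three-way decision in a. through c. can be sketched in C. This is a simplified model, not the MUNIX source: core is represented as a single count of free units, the numeric state and flag values are assumed, only SRUN processes are considered for swap-in, and step c.(2) skips system and locked processes as the easy-core test does (both assumptions). Note that after every swap-out the loop starts over and searches for the longest-out process again, reusing p1 for the in-core search; section C of Appendix C describes removing that repetition.

```c
#include <stddef.h>

#define SSLEEP 1
#define SWAIT  2
#define SRUN   3
#define SLOAD  001
#define SSYS   002
#define SLOCK  004
#define SGOING 010

struct proc {
    int p_stat, p_flag;
    int p_time;             /* seconds in or out of core */
    int p_size;             /* core units needed */
};

static void swap_in(struct proc *p, int *core_free)
{
    *core_free -= p->p_size;
    p->p_flag |= SLOAD;
    p->p_time = 0;
}

static void swap_out(struct proc *p, int *core_free)
{
    *core_free += p->p_size;
    p->p_flag &= ~SLOAD;
    p->p_time = 0;
}

/* One round of sched: 1 if a process was swapped in,
   0 if sched would go to sleep (on RUNOUT or RUNIN). */
int sched_once(struct proc *tab, int n, int *core_free)
{
    struct proc *p1, *rp;
    int i, t;

    for (;;) {
        /* Find the ready process out of core the longest. */
        p1 = NULL;
        t = -1;
        for (i = 0; i < n; i++) {
            rp = &tab[i];
            if (rp->p_stat == SRUN && !(rp->p_flag & SLOAD) && rp->p_time > t) {
                p1 = rp;
                t = rp->p_time;
            }
        }
        if (p1 == NULL)
            return 0;                       /* sleep on RUNOUT */

        if (p1->p_size <= *core_free) {     /* a. room exists */
            swap_in(p1, core_free);
            return 1;
        }

        /* b. easy core: loaded, not SSYS or SLOCK, waiting on I/O */
        for (i = 0; i < n; i++) {
            rp = &tab[i];
            if ((rp->p_flag & (SLOAD | SSYS | SLOCK)) == SLOAD && rp->p_stat == SWAIT)
                break;
        }
        if (i < n) {
            swap_out(rp, core_free);
            continue;                       /* start over from the top */
        }

        /* c.(1) incoming process must have been out more than 2 s */
        if (p1->p_time <= 2)
            return 0;                       /* sleep on RUNIN */

        /* c.(2) oldest sleeping or ready, not running, process in core */
        p1 = NULL;
        t = -1;
        for (i = 0; i < n; i++) {
            rp = &tab[i];
            if ((rp->p_flag & (SLOAD | SSYS | SLOCK | SGOING)) == SLOAD &&
                (rp->p_stat == SRUN || rp->p_stat == SSLEEP) && rp->p_time > t) {
                p1 = rp;
                t = rp->p_time;
            }
        }
        if (p1 == NULL || t < 2)
            return 0;                       /* sleep on RUNIN */
        swap_out(p1, core_free);            /* then start over */
    }
}
```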

2. swtch

This function is invoked in several places
throughout MUNIX to accomplish the task of rescheduling the
CPUs. Swtch searches the process table for the highest
priority process that is in core and ready to run on the
requesting CPU. If a process is found it is given the CPU;
otherwise the CPU is put in an idle state. It stays in an
idle state until started again by an interrupt.
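The selection swtch performs can be sketched under stated assumptions: processor 0 stands for the first (B) processor and processor 1 for the second (A), a smaller p_pri value means a higher priority (see the description of p_pri above), and SGOING is set on the chosen process so that the other CPU will not also pick it. The function name swtch_pick and the flag values are hypothetical, and the context switch itself is omitted.

```c
#define SRUN   3
#define SLOAD  001
#define SMDF   002      /* must run on the first (B) processor */
#define SMDS   004      /* must run on the second (A) processor */
#define SGOING 010      /* being run by some processor */

struct proc { int p_stat, p_flag, p_pri; };

/* cpu 0 = first (B) processor, cpu 1 = second (A); numbering assumed. */
static int may_run_on(const struct proc *p, int cpu)
{
    if ((p->p_flag & SMDF) && cpu != 0)
        return 0;
    if ((p->p_flag & SMDS) && cpu != 1)
        return 0;
    return 1;
}

/* Index of the process given the CPU, or -1 if the CPU goes idle. */
int swtch_pick(struct proc *tab, int n, int cpu)
{
    int i, best = -1;

    for (i = 0; i < n; i++) {
        struct proc *p = &tab[i];

        if (p->p_stat != SRUN || !(p->p_flag & SLOAD) || (p->p_flag & SGOING))
            continue;                   /* not in core and ready */
        if (!may_run_on(p, cpu))
            continue;                   /* masked off this processor */
        if (best < 0 || p->p_pri < tab[best].p_pri)
            best = i;                   /* smaller value = higher priority */
    }
    if (best >= 0)
        tab[best].p_flag |= SGOING;
    return best;
}
```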



3. sleep

This function is also invoked in several places
throughout MUNIX. It will change the process' status from
ready to sleeping or waiting depending on the value of pri.
If pri is less than zero, a signal cannot disturb the sleep,
and the status is changed to SSLEEP. If pri is not less
than zero, the status is changed to SWAIT, and the process
may be disturbed by signals. Chan is an integer that
represents the reason the process has been placed in a wait
state (SWAIT or SSLEEP). After the process has been put in
a wait state, sleep calls swtch to find another process to
run.

4. wakeup

This function changes the status of all
processes that have been put to sleep on chan from the wait
state to the SRUN state (ready to run). If any processes
awakened are on the swap file and sched is sleeping on the
RUNOUT flag, it is also awakened. When swtch is next called,
sched will be scheduled to run (sched is the highest
priority system process), and an attempt will be made to
find core for all the processes just awakened by wakeup.
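The state changes made by sleep and wakeup can be modelled as below. This is a hypothetical sketch: sleep_on stands in for sleep (its call to swtch is omitted), the int pointed to by sched_on_runout stands in for sched sleeping on the RUNOUT flag, and the numeric state and flag values are assumed.

```c
#define SSLEEP 1
#define SWAIT  2
#define SRUN   3
#define SLOAD  001

struct proc { int p_stat, p_flag, p_wchan; };

/* Record the channel and pick the wait state from pri.
   The real sleep would now call swtch to run another process. */
void sleep_on(struct proc *p, int chan, int pri)
{
    p->p_wchan = chan;
    p->p_stat = (pri < 0) ? SSLEEP : SWAIT;  /* pri < 0: signals cannot disturb */
}

/* Make every process waiting on chan ready to run. If one of them is
   out of core and sched is sleeping on RUNOUT, sched is awakened too.
   Returns the number of processes awakened. */
int wakeup(struct proc *tab, int n, int chan, int *sched_on_runout)
{
    int i, count = 0;

    for (i = 0; i < n; i++) {
        struct proc *p = &tab[i];

        if ((p->p_stat == SSLEEP || p->p_stat == SWAIT) && p->p_wchan == chan) {
            p->p_stat = SRUN;
            p->p_wchan = 0;
            count++;
            if (!(p->p_flag & SLOAD) && *sched_on_runout)
                *sched_on_runout = 0;        /* sched is awakened */
        }
    }
    return count;
}
```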



APPENDIX B: SYSTEM BENCHMARK 



A. GENERAL 

This appendix contains information concerning the
benchmark program used for testing and evaluating scheduling
changes in this thesis.

B. INDIVIDUAL PROCESSES 



Times for all the individual processes can be
found in Table VIII. The times used in Table VIII are from
the "A" system (see section II.B.2). It is significant to
note that the times on the "B" system are faster because
there is approximately three times as much "user core"
available. The benchmark consists of a series of eight
processes (discussed below) executed from a command file.

1. chdir /usr/sys

Change the current working directory to the
one specified; in this case the new working directory is
/usr/sys. This command does not create a new process, but
is directly executed in the shell.

2. sh ld&

Execute the command file ld. File ld loads a
new operating system and places it in a file named "a.out".



3. chdir conf

See 1. above.

4. cc -c conf.c&

Compile without loading the C program named
"conf.c"; this "program" consists of data statements,
initialization, and no executable code. The compiled object
code goes in a file named "conf.o".


5. chdir /usr/bench

See 1. above.

6. cc -O rftest.c&

Compile the C program "rftest.c" using the
experimental object-code optimizer. The optimized object
code goes in a file named "a.out".


7. bas tower <towerin >/dev/null&

This is a compute-bound process that has an
input file named "towerin" and an output file named
"/dev/null". The output file is a null device. Tower is an
interpretive execution of a recursive solution to the towers
of Hanoi problem which represents tokens as double
precision floating point numbers. Solution is for thirteen
disks and three towers.



8. chdir /usr/sys/dmr

See 1. above.

9. cc -c -O tm.c&

Compile the C program named "tm.c" without
loading it and with the experimental object-code optimizer.
The resulting object code goes in a file named "tm.o".

10. cp /munix /dev/null&

Copy the 34,800 byte file named "/munix" to
the file named "/dev/null".

11. chdir /usr/sys

See 1. above.

12. sum /usr/sys/lib1 >/dev/null&

Compute the checksum of the 60,390 byte file
named "/usr/sys/lib1" and output the number to the file
named "/dev/null".

13. sum /usr/sys/lib2 >/dev/null&

See 12. above.

14. wait

Wait until all processes started with "&"
have completed, and report any abnormal terminations. There
is no measurable time associated with this command if it
stands alone.



All the times used in Table VIII have come from the
time command of UNIX [18]. Execution times (user and system)
are determined by sampling the state of the system at a 60
Hz rate (every 1/60 second). A counter is kept for each type
of time. Note that "nm" means not measurable. It is
significant to note that the execution time can depend on
what kind of memory the process happens to occupy. The user
time in MOS memory is approximately half of what it is in
core [18]. This problem has been solved by running the
benchmark program as a single user, which forces the same
processes into the same type of memory. The elapsed time
(real) is accurate to the second, while the CPU times (user
and system) are measured to the 60th of a second. It was
found that the system times may vary by as much as 8.5
percent, and the real time by as much as 8.3 percent.
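The sampling scheme can be sketched as follows. The per-tick hook clock_tick and the times structure are hypothetical names: the point is that a whole 1/60 second is charged to whichever mode (user or system) the CPU was in when the clock interrupt arrived, so very short bursts of CPU activity are only approximately accounted for. ticks_to_tenths truncates to tenths of a second, the resolution shown in Table VIII; whether the time command rounds or truncates is not stated in the text.

```c
#define HZ 60                   /* sampling rate given in the text */

struct times {
    long utime;                 /* ticks charged as user time */
    long stime;                 /* ticks charged as system time */
};

/* Hypothetical per-tick hook: charge the whole tick to the
   mode the CPU was found in when the clock interrupted. */
void clock_tick(struct times *t, int in_user_mode)
{
    if (in_user_mode)
        t->utime++;
    else
        t->stime++;
}

/* Convert ticks to tenths of a second (truncating). */
long ticks_to_tenths(long ticks)
{
    return ticks * 10 / HZ;
}
```

For example, 2796 user ticks correspond to 46.6 seconds of user time.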



Process   real (s)   user (s)   sys (s)
   1         nm         nm         nm
   2         40         4.7        3.9
   3         nm         nm         nm
   4         21         1.0        1.9
   5         nm         nm         nm
   6         37         4.2        4.0
   7         34        30.8        0.7
   8         nm         nm         nm
   9         41        10.9        3.4
  10          5         0.0        0.6
  11         nm         nm         nm
  12          6         0.3        0.7
  13          5         0.3        0.6
  14         nm         nm         nm
 Sum       3:09        52.2       15.8

(The real-time sum is given in minutes:seconds.)

Table VIII. Individual Process Times
C. BENCHMARK PROGRAM

The benchmark program consists of all the processes
mentioned in section B. above in a multiprogrammed
mix. Time for the multiprogrammed benchmark is:

real   7:08
user   46.6
sys    18.0

Table IX. Benchmark Program Evaluation

The command sequence "time sh benchmark" is used to
initiate processing of the benchmark.


APPENDIX C: NON-ADAPTIVE SCHEDULING CHANGES



A. GENERAL 

This appendix contains detailed information
concerning the non-adaptive changes made to the MUNIX (based
on UNIX Version 5) scheduling algorithm.

B. MAXIMUM NUMBER OF PROCESSES (NPROC) 

1. Change

NPROC was a constant defined in param.h to be
fifty. That means there can be no more than fifty processes
in the system at any one time. Twelve system functions used
NPROC for searching the process table. For example, if
function swtch was looking for the highest priority process,
it looked at all entries in the process table. Normally this
would not be considered wasteful, but the process table is
very seldom, if ever, completely full, so time is wasted
whenever the table holds fewer than fifty processes. Since
processes are entered into the process table at the first
available space, a counter could be used to mark the highest
occupied entry in the process table. Time could be saved by
searching the process table from the beginning to the
counter. This change was made as follows:



1. A new integer variable, nproc (lower case), was
placed in proc.h to keep track of the last process in the
process table. The twelve functions that used NPROC now use
nproc.

2. Two lines of code have been added to function
newproc (in program slp.c). The code ensures that nproc is
incremented when necessary.

3. Two lines of code have been added to function
wait (in program sys1.c) to ensure that when the last
process in the process table terminates, nproc is
decremented. It is not sufficient to decrement nproc by one
in all cases. Example: The process table could have the
first nine process blocks allocated, a new process enters
and takes block ten, processes eight and nine terminate, then
process ten terminates. If nproc was decremented by one,
then all searches would look at blocks eight and nine
unnecessarily.
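A minimal sketch of the nproc high-water mark, assuming a free process block is marked by p_stat == 0. The names alloc_block, free_block, and count_in_state are hypothetical stand-ins for the relevant parts of newproc, wait, and the twelve table searches. free_block backs nproc down past every trailing free block, which handles the example in item 3: once blocks eight, nine, and ten are all free, nproc drops from ten to seven rather than to nine.

```c
#define NPROC 50

struct proc { int p_stat; };        /* p_stat == 0 marks a free block */

struct ptable {
    struct proc proc[NPROC];
    int nproc;                      /* one past the highest block in use */
};

/* newproc: take the first free block and raise nproc if needed.
   Returns the block index, or -1 if the table is full. */
int alloc_block(struct ptable *t, int stat)
{
    int i;

    for (i = 0; i < NPROC; i++)
        if (t->proc[i].p_stat == 0) {
            t->proc[i].p_stat = stat;
            if (i >= t->nproc)
                t->nproc = i + 1;
            return i;
        }
    return -1;
}

/* wait: free a block, then move nproc down past every
   trailing free block, not just by one. */
void free_block(struct ptable *t, int i)
{
    t->proc[i].p_stat = 0;
    while (t->nproc > 0 && t->proc[t->nproc - 1].p_stat == 0)
        t->nproc--;
}

/* A table search that stops at nproc instead of NPROC. */
int count_in_state(const struct ptable *t, int stat)
{
    int i, n = 0;

    for (i = 0; i < t->nproc; i++)
        if (t->proc[i].p_stat == stat)
            n++;
    return n;
}
```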

2. Evaluation

The benchmark program (see Appendix B) was
run against the scheduling algorithm before and after this
change. Four runs were made, two with a drum being used for
/tmp files (temp files) and two without. The results are
listed in Tables III and IV. Real, user, and system times
are shown in minutes and seconds. Appendix B explains how
the system calculates these times and estimates their
accuracy. This testing was accomplished on the "B" system
(see section II.B.3).
          Before Changes    After Changes
real      6:02              6:00
user      2:12              2:00
sys        :42               :41

Table III. NPROC Change Evaluation with No Drum

          Before Changes    After Changes
real      4:49              4:38
user      1:68              1:48
sys        :41               :42

Table IV. NPROC Change Evaluation with Drum



C. LOOPING IN FUNCTION SCHED 



1. Change



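As described in sections B.1.b and B.1.c.(2)
of Appendix A, sched unnecessarily loops back to the start
of its search, repeating the search for the process that has
been out of core the longest even though that process has
already been found. The two loops were changed as follows: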

1. A label, "findsp" (find space for process), was
inserted where sched starts looking for core for the process
it just found (the process that has been out of core the
longest). When easy core has been found, instead of
branching back to look for the process that has been out of
core the longest, sched branches to findsp.

2. A pointer, "p2", was added to the declarations
of sched. The pointer p2 was substituted for p1 in the first
two instances after no easy core is found. This leaves p1
pointing to the process that has been out of core the
longest, so there is no need to search for that process again.
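The revised control flow can be sketched with the same simplified model used for sched in Appendix A (core as a count of free units; assumed state and flag values). This is an illustration, not the MUNIX code: the search for the longest-out process runs once, both swap-out paths branch to findsp, and p2 rather than p1 names the oldest in-core process, so p1 still identifies the incoming process when control returns to findsp. The counter searches is a hypothetical instrument added only to show that the search is no longer repeated.

```c
#include <stddef.h>

#define SSLEEP 1
#define SWAIT  2
#define SRUN   3
#define SLOAD  001
#define SSYS   002
#define SLOCK  004
#define SGOING 010

struct proc { int p_stat, p_flag, p_time, p_size; };

/* 1 if p1 was swapped in, 0 if sched would sleep. */
int sched_findsp(struct proc *tab, int n, int *core_free, int *searches)
{
    struct proc *p1, *p2, *rp;
    int i, t;

    (*searches)++;                          /* done once per round */
    p1 = NULL;
    t = -1;
    for (i = 0; i < n; i++) {
        rp = &tab[i];
        if (rp->p_stat == SRUN && !(rp->p_flag & SLOAD) && rp->p_time > t) {
            p1 = rp;
            t = rp->p_time;
        }
    }
    if (p1 == NULL)
        return 0;                           /* sleep on RUNOUT */

findsp:
    if (p1->p_size <= *core_free) {         /* room exists: swap p1 in */
        *core_free -= p1->p_size;
        p1->p_flag |= SLOAD;
        p1->p_time = 0;
        return 1;
    }
    for (i = 0; i < n; i++) {               /* easy core */
        rp = &tab[i];
        if ((rp->p_flag & (SLOAD | SSYS | SLOCK)) == SLOAD && rp->p_stat == SWAIT) {
            *core_free += rp->p_size;
            rp->p_flag &= ~SLOAD;
            rp->p_time = 0;
            goto findsp;                    /* p1 is still valid */
        }
    }
    if (p1->p_time <= 2)
        return 0;                           /* sleep on RUNIN */

    p2 = NULL;                              /* p2, not p1, names the victim */
    t = -1;
    for (i = 0; i < n; i++) {
        rp = &tab[i];
        if ((rp->p_flag & (SLOAD | SSYS | SLOCK | SGOING)) == SLOAD &&
            (rp->p_stat == SRUN || rp->p_stat == SSLEEP) && rp->p_time > t) {
            p2 = rp;
            t = rp->p_time;
        }
    }
    if (p2 == NULL || t < 2)
        return 0;                           /* sleep on RUNIN */
    *core_free += p2->p_size;
    p2->p_flag &= ~SLOAD;
    p2->p_time = 0;
    goto findsp;
}
```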

2. Evaluation

The benchmark program (see Appendix B) was
run before and after the changes. Several runs were made
because the statistics showed no significant savings (see
Table V). Testing was accomplished on the "A" system.



          Before Changes    After Changes
real      7:08
user      :46.6             :46.6
sys       :16.0             :17.8

Table V. Looping Change Evaluation



D. SIZE CHECK 



1. Change

In two separate places, function sched
swapped a process out of core without giving any
consideration as to whether that action created enough room
for the incoming process. Many times two or three processes
were swapped out of core when only one was necessary. This
creates a large overhead in swapping. A two-pass check was
installed to circumvent this problem.

1. First pass - Check to see that the process
being swapped out of core is as large as or larger than the
one being swapped in; if not, do not swap it out.

2. Second pass - If the first pass fails to create
enough room for the incoming process, swap eligible
processes out until enough room exists.

This change was implemented using a first/second pass
indicator (fspass) having the value of zero for the first
pass and one for the second pass. This indicator was "or"ed
with the size check, thereby using the same code for both
passes.
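The two-pass check can be sketched as a single loop in which fspass is ORed into the size test, as described above. This is a hypothetical illustration: eligibility is modelled by the easy-core test from Appendix A (in core, not SSYS or SLOCK, waiting), core is a count of free units, the flag values are assumed, and make_room is an invented name.

```c
#define SWAIT  2
#define SLOAD  001
#define SSYS   002
#define SLOCK  004

struct proc { int p_stat, p_flag, p_size; };

/* Assumed eligibility test: the easy-core conditions. */
static int eligible(const struct proc *p)
{
    return (p->p_flag & (SLOAD | SSYS | SLOCK)) == SLOAD && p->p_stat == SWAIT;
}

/* Swap eligible processes out until the incoming process fits.
   fspass == 0: only a process at least as large as the incoming one;
   fspass == 1: any eligible process. ORing fspass into the size test
   lets both passes share one loop. Returns the number swapped out,
   or -1 if enough room could not be made. */
int make_room(struct proc *tab, int n, const struct proc *incoming, int *core_free)
{
    int swapped = 0;

    while (*core_free < incoming->p_size) {
        int fspass, i, found = 0;

        for (fspass = 0; fspass <= 1 && !found; fspass++)
            for (i = 0; i < n && !found; i++)
                if (eligible(&tab[i]) &&
                    (fspass || tab[i].p_size >= incoming->p_size)) {
                    tab[i].p_flag &= ~SLOAD;        /* swap it out */
                    *core_free += tab[i].p_size;
                    swapped++;
                    found = 1;
                }
        if (!found)
            return -1;
    }
    return swapped;
}
```

For an incoming 10-unit process with eligible 4-unit and 12-unit processes in core (the 4-unit one first in the table), the first pass skips the 4-unit process and swaps out only the 12-unit one; without the size check the 4-unit process would be swapped out first and the 12-unit one would still be needed.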



2. Evaluation

Several runs were made with the benchmark
program on the "A" system because of the 26 percent savings
realized in real time (see Table VI). This change was also
tested on the "B" system, but a savings of only 6 percent
was found there. The difference in savings is explained by
the significant difference in available memory for each
system.



          Before Changes    After Changes
real      7:08              5:18
user      :46.6             :45.3
sys       :18.0             :17.9

Table VI. Size Check Change Evaluation



APPENDIX D: PRIORITY CALCULATION SUBROUTINE 



/*
 * This subroutine was written by Ronald E. Joy and
 * used for an adaptive scheduler in November of 1975.
 */

#include "../param.h"
#include "../proc.h"



/*
 * If the following variables are needed in any
 * other program, include schpri.h
 */

int tchg1  = 4;     // slope 1 changes at this time (seconds)
int slop1  = 2;     // number of bits to shift left = *4 (slope 1)
int slop2  = 0;     // number of bits to shift left = *1 (slope 2)
int chgt1  = 16;    // if tchg1 or slop1 are changed, chgt1 must be
                    // changed also: chgt1 = tchg1 << slop1
int bgpri  = 300;   // background priority
int minpri = -100;  // lowest priority (value)
int maxpri = 300;   // max priority (value)
int mtpri  = 250;   // max time priority
int mxtime = 540;   // max time = 9 min (sec.)
int maxres = 7200;  // max resource units, this value is = 2 minutes



/*
 * The following code is used to set a user's priority
 * between -300 and 300. A value of 0 is the least
 * critical priority, with -300 being the most
 * critical. Any value over 0 is non-critical. A
 * process is critical if it has not received as
 * many resource units as dictated by the policy
 * function.
 */



int
schpri(struct proc *nrp)    // schedule priority; nrp points to the
                            // process that needs a priority calculated
{
    register struct proc *rp;   // process pointer
    register int pri;           // calculated priority
    register int resr;          // resource units received
    int time;                   // current time of this process

    rp = nrp;
    time = rp->p_time;
    resr = rp->p_resr;

    if (rp->p_flag&PSTM || rp->p_flag&TPWAIT)
        // if this process is already a background
        // process or it is waiting for terminal I/O
        pri = bgpri;            // priority = background
    else {
        if (resr < 0)           // has the resource count
            // gone over 32767? there is no chance
            // of this happening with mxtime set to
            // its current value.
            {rp->p_flag |= PSTM; pri = bgpri;}
            // set PSTM flag and priority to bg
        else {
            if (time >= mxtime)         // if process has
                // been alive longer than mxtime
                if (resr > maxres)      // if resource
                    // count greater than maxres
                    {rp->p_flag |= PSTM; pri = bgpri;}
                    // set PSTM and pri to bg
                else
                    pri = mtpri;        // max time priority
            else {
                if (time < tchg1)       // if process
                    // has been alive less than tchg1
                    pri = resr - (time << slop1);
                else
                    pri = resr - chgt1 -
                        ((time - tchg1) << slop2);
                if (pri > maxpri)       // if priority
                    // is too large
                    pri = maxpri;       // fix it
                else
                    if (pri < minpri)   // too small
                        pri = minpri;   // fix it
            }
        }
    }
    return (pri);   // return priority to caller
}
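For a process that is neither in the background nor past mxtime, the listing computes resr minus a piecewise-linear allowance of resource units: time << slop1 (four units per second) up to tchg1 seconds, then chgt1 plus (time - tchg1) << slop2 (one unit per second). Reading that allowance as the "policy function" named in the comment is an interpretation, and the standalone names entitled and policy_pri below are hypothetical; the constants are copied from the listing. The sketch also shows why chgt1 must track tchg1 and slop1: with chgt1 = tchg1 << slop1 the two pieces meet at time = tchg1, so the allowance has no jump.

```c
/* Constants copied from the listing above. */
#define TCHG1   4                   /* slope 1 changes at this time (s) */
#define SLOP1   2                   /* shift for slope 1 (x4) */
#define SLOP2   0                   /* shift for slope 2 (x1) */
#define CHGT1   (TCHG1 << SLOP1)    /* allowance reached at TCHG1 */
#define MINPRI  (-100)
#define MAXPRI  300

/* Resource units the policy is read as granting after `time` seconds. */
static int entitled(int time)
{
    if (time < TCHG1)
        return time << SLOP1;                   /* 4 units per second */
    return CHGT1 + ((time - TCHG1) << SLOP2);   /* then 1 per second */
}

/* Priority for a non-background process younger than mxtime:
   received minus entitled, clamped to [MINPRI, MAXPRI]. */
int policy_pri(int resr, int time)
{
    int pri = resr - entitled(time);

    if (pri > MAXPRI)
        pri = MAXPRI;
    else if (pri < MINPRI)
        pri = MINPRI;
    return pri;
}
```

For example, a process 10 seconds old that has received 100 units is entitled to 16 + 6 = 22 and gets priority 78; one that has received nothing after 100 seconds computes -112 and is clamped to minpri, -100.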